Rate of Convergence and Error Bounds for LSTD($\lambda$)

Authors

  • Manel Tagorti
  • Bruno Scherrer
Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability estimate of the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, which extends (and slightly improves) the bound derived by Lazaric et al. (2010) in the specific case where λ = 0. In particular, our analysis sheds some light on how to choose λ as a function of the quality of the chosen linear space and the number of samples, and its conclusions are consistent with simulations.
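
For readers unfamiliar with the algorithm, the following is a minimal sketch of LSTD(λ) in its usual textbook form (accumulate a matrix A and a vector b along one trajectory using an eligibility trace, then solve Aθ = b). It is an illustration only, not the authors' code and not part of their analysis. The names `lstd_lambda`, `solve_linear`, and `one_hot`, and the two-state toy chain, are assumptions introduced here.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; more samples are needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(states, rewards, phi, gamma, lam):
    """Estimate theta such that phi(x) . theta approximates the value of x.

    states  : visited states x_0, ..., x_n (one trajectory of the fixed policy)
    rewards : rewards r_0, ..., r_{n-1}, one per transition
    phi     : feature map, state -> list of d floats (hypothetical helper)
    """
    d = len(phi(states[0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = list(phi(states[0]))  # eligibility trace z_0 = phi(x_0)
    for t, r in enumerate(rewards):
        f_now, f_next = phi(states[t]), phi(states[t + 1])
        for i in range(d):
            for j in range(d):
                # A += z_t (phi(x_t) - gamma * phi(x_{t+1}))^T
                A[i][j] += z[i] * (f_now[j] - gamma * f_next[j])
            b[i] += z[i] * r  # b += z_t r_t
        # z_{t+1} = gamma * lambda * z_t + phi(x_{t+1})
        z = [gamma * lam * z[i] + f_next[i] for i in range(d)]
    return solve_linear(A, b)


# Toy deterministic chain (an assumption for illustration):
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
def one_hot(s):
    return [1.0 if s == i else 0.0 for i in range(2)]

states = [0, 1, 0, 1, 0]
rewards = [1.0, 0.0, 1.0, 0.0]
theta = lstd_lambda(states, rewards, one_hot, gamma=0.5, lam=0.5)
print(theta)
```

On this toy chain the features represent the value function exactly, so the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, and the sketch recovers these values for any λ. With noisy transitions and an imperfect feature space, the estimate depends on λ and on the number of samples, which is the regime the abstract addresses.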

Similar resources

On the Rate of Convergence and Error Bounds for LSTD(\(\lambda\))

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We d...

Approximate Policy Iteration: A Survey and Some New Methods

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...

Convergence analysis of the global FOM and GMRES methods for solving matrix equations $AXB=C$ with SPD coefficients

In this paper, we study convergence behavior of the global FOM (Gl-FOM) and global GMRES (Gl-GMRES) methods for solving the matrix equation $AXB=C$ where $A$ and $B$ are symmetric positive definite (SPD). We present some new theoretical results of these methods such as computable exact expressions and upper bounds for the norm of the error and residual. In particular, the obtained upper...

LSTD with Random Projections

We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is larger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space. We provide a thorough theoretical analysis of the LSTD with random p...

Convergence rate analysis and error bounds for projection algorithms in convex feasibility problems

Amir Beck & Marc Teboulle (2003). Convergence rate analysis and error bounds for projection algorithms in convex feasibility problems. Optimization Methods and Software, 18:4, 377-394. DOI: 10.1080/10556780310001604977. To link to this article: http:/...

Publication date: 2014